Climate change is a big problem. Like, a really big problem. According to the World Health Organization, climate change is already estimated to kill over 150,000 people every year, and it is a virtual guarantee that the situation will worsen. Naturally, the conversation around this catastrophe is often through a very large lens. “Will the United States rejoin the Paris Climate Agreement?” “Does it even matter if China and India’s greenhouse gas emissions continue at these rates?” We’ve all heard these narratives and understand their importance. To the extent that climate change is discussed, these big picture items dominate the public discourse. This singular focus on the enormity of the challenge we face is an equally enormous mistake.
Large scale introduces strong friction, and as such these points of discussion are often unwieldy and frustratingly unproductive. The public has been having this big idea discussion - again, to the extent that it has been having any serious discussion regarding climate change - for decades. Our progress towards solutions has been minimal. Perhaps it is time for a different tack. Perhaps it is time to think small.
While national and global action capture the imagination, local progress represents an arena for real, steady gains in this struggle. If we all refocused some of our attention on the ways in which our closest communities could contribute to a solution, specific, concrete strategies would lead to legitimate gains. Get enough grains of sand, and you have a mound. Do so quickly, and you can build a dune.
To be clear - climate change will require continued attention on the aforementioned big picture problems. A global crisis requires global solutions, and we in no way advocate for abandoning these discussions. But, we have to start somewhere. And we have to start now. Local action can grant greater autonomy to individuals, offer proof of concept for larger-scale solutions that will be necessary, and bring swifter advancement. Parallel tracks of focus will be necessary moving forward.
So, in an attempt to make immediate progress on an immediate problem, we encourage readers to think local. As New Yorkers, we will focus our paper on New York City (you know a problem is tremendous when New York constitutes small) and New York State. Specifically, we aim to take stock of how sustainable an area New York currently is, identify areas for improvement among private citizens and through public policy, and explore ways in which these arenas can interact. We begin with an overview of publicly available information on the topic.
On the city level, our key data source is NYC Open Data. This site offers a library of datasets collected by municipal government agencies.
To begin, we wanted to understand how well NYC recycles, so we turned to a simple recycling diversion and capture dataset. The data shows diversion and capture rates (as well as a breakout of the latter) for each zone and district in each borough for each month of 2019. It has approximately 2,800 observations.
In order to take stock of current emission factors, we rely mostly on two key datasets. First, we use an inventory of sources of greenhouse gas emissions from fuel emissions in NYC. The most crucial variables available are sector (e.g. natural gas, propane), source (e.g. stationary, on-road mobile) and CO\(_2\) emission rates. Due to the grouped nature of the data, the set itself is quite manageable, with only 24 observations.
The second key dataset for understanding New York City greenhouse gas emissions contains energy and water disclosure data for buildings - specifically, buildings covered by Local Law 84. The dataset contains variables such as zip code, BBL (a unique property identifier), total square footage, kilowatt hours of electricity consumed and total metric tons of greenhouse gas emissions for buildings. The set has approximately 29,000 observations. One immediately apparent issue was the presence of negative GHG emissions for one observation. It seems to be an obvious error and is filtered out. Further issues regarding the data will be discussed in relation to a second dataset, as well as in sections III and IV.
Also crucial for understanding New York City buildings was a primary land use dataset, which also contained BBL and key building measurements for understanding the 2D square footage of buildings (i.e. square footage on a single floor). This dataset has over 850,000 observations. This highlights a key shortcoming of the preceding dataset, as it documents only a limited sample of buildings. Further, it is far from a random sample, as Local Law 84 covers only buildings reaching specific thresholds of square footage. In order to consider the city as a whole, the energy use dataset has to be used to make estimates for the wider set of buildings.
Lastly, we also made use of a non-Open-NYC dataset to access its variables mapping zip codes to boroughs. It has 177 entries.
On the state level, we rely on the state-level counterpart to NYC Open Data, data.ny.gov. This site offers a similar depth and breadth of datasets. For our purposes, it was particularly useful for understanding the current state of vehicle usage (e.g. greenhouse gas emissions from cars, how many electric vehicles are in circulation, etc.).
Firstly, we make use of data on greenhouse gas emissions from fuel combustion. This dataset is very similar to the city-level one, with variables representing metric tons of greenhouse gas emissions from sources such as residential, transportation and commercial activity, and with the added benefit of a longer time period represented by the year variable. It is also of similar size to the previous set, with 27 observations. This dataset is state-level but also crucial for understanding the city-level data we had, as it demonstrated key contrasts between the city and state.
From here, we relied heavily on NYSERDA data - a subset of the entire state’s data library dedicated to understanding energy research and development.
Of initial interest is a dataset on the state’s status in clean energy programs. We use this dataset’s variables, such as program name, direct participants acquisitions planned, and direct participants acquired to date, in order to gauge the planned scale and current effectiveness of the state’s electric vehicle rebate program. The set has approximately 23,000 observations.
In concert with the previous dataset, we also used one dedicated to the clean rebate program. This dataset has variables such as make, model, rebate, and zip code. While the previous dataset was crucial for understanding overall program effectiveness, this one gives a more granular view and provides insights into where and how the program is or is not working. It has approximately 37,000 observations.
Lastly, while our focus was local, benchmarking is important. As such, we made use of global electric vehicle data in order to provide insights into how we are pacing in terms of EV use growth.
To begin, much of our key electric vehicle data was formatted with inconvenient variable names, so we made use of a custom function to reformat them into more usable forms. In addition, we formatted date data and made use of a combination of piping and grouping to get counts for rebates:
library(tidyverse)  # dplyr pipes, grouping and summaries
library(zoo)        # as.yearqtr() for year-quarter values
library(lubridate)  # date helpers
# from resources.xlsx
ev_charging_use_data = resources_sheets$`Charging Use`
# use clean_rebate_data instead - more up to date but same thing
# ev_rebate_data = resources_sheets$`Drive Clean Rebate`
# from EV-Registration-Tables.xlsx
ev_zip_code_data = ev_registration_tables_sheets$`Current by ZIP Code`
ev_make_model_data = ev_registration_tables_sheets$`Current by Make-Model`
ev_county_data = ev_registration_tables_sheets$`Current by County`
ev_over_time_data = ev_registration_tables_sheets$`Original Over Time`
ev_make_over_time = ev_registration_tables_sheets$`Original by Make`
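# Return lower-case, underscore-separated column names for a data frame
cleanCols <- function(x){
  lower_names = tolower(names(x))
  sub_periods = gsub("\\.", "_", lower_names)
  sub_spaces = gsub(" ","_",sub_periods)
  # collapse any run of underscores (not just pairs) into a single one
  clean_df_cols = gsub("_+","_",sub_spaces)
  return(clean_df_cols)
}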
names(global_electric_car_sales_data) = cleanCols(global_electric_car_sales_data)
names(energy_programs_data) = cleanCols(energy_programs_data)
names(emissions_from_fuel_data) = cleanCols(emissions_from_fuel_data)
names(fuel_emission_factors_data) = cleanCols(fuel_emission_factors_data)
names(clean_rebate_data) = cleanCols(clean_rebate_data)
names(ev_registrations_data) = cleanCols(ev_registrations_data)
# TODO: confirm whether the CVRP data is still needed
# names(federal_cvrp_stats) = cleanCols(federal_cvrp_stats)
# ev_rebate_data is not loaded; clean_rebate_data is used instead (see above)
# names(ev_rebate_data) = cleanCols(ev_rebate_data)
# ev_original_registrations_data = filter(ev_registrations_data, registration == 'Original')
# parse the submission date and derive year and year-quarter fields
clean_rebate_data$date = as.Date(clean_rebate_data$submitted_date, "%m/%d/%Y")
clean_rebate_data$year = format(clean_rebate_data$date, format="%Y")
clean_rebate_data$year_quarter <- as.yearqtr(clean_rebate_data$date)
# sort by date so the running total accumulates in chronological order
clean_rebate_data = clean_rebate_data %>% arrange(date)
clean_rebate_data$cumulative_rebate_amount = cumsum(clean_rebate_data$rebate_amount_usd_)
# count by quarter
clean_rebate_counts = clean_rebate_data %>% group_by(year_quarter) %>% tally()
# dollar total and count by quarter, computed in one summarise() so the
# grouping column keeps its name and the counts always line up with the sums
clean_rebate_data_by_quarter = clean_rebate_data %>%
  group_by(year_quarter) %>%
  summarise(rebate_amount = sum(rebate_amount_usd_), count = n()) %>%
  mutate(cumulative_rebate_amount = cumsum(rebate_amount))
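As a quick sanity check on the renaming step, the sketch below applies a standalone variant of the column-cleaning function to a toy data frame. The toy column names are assumptions, chosen to resemble what `read.csv()` produces for headers containing spaces and parentheses; they are not taken from the actual files.

```r
# Standalone variant of the column-cleaning helper (base R only)
clean_cols <- function(x) {
  lower_names <- tolower(names(x))
  sub_periods <- gsub("\\.", "_", lower_names)
  sub_spaces <- gsub(" ", "_", sub_periods)
  gsub("_+", "_", sub_spaces)  # collapse runs of underscores
}

# Hypothetical raw headers; the values are made up
toy <- data.frame(Submitted.Date = "01/15/2019",
                  Rebate.Amount..USD. = 2000,
                  Make = "Example")

cleaned_names <- clean_cols(toy)
print(cleaned_names)
```

Under these assumptions, a header like `Rebate.Amount..USD.` becomes `rebate_amount_usd_`, which matches the trailing underscore seen in the field names used in the code above.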
Next, we reshape our recycling data to be grouped by zone and month, using the median of the district-level rates to represent the typical rate for each zone. We also create a second month variable for ease of access.
df_rec$month = df_rec$`Month Name`
# keep only the columns needed downstream (selected by position in the raw file)
rec_use <- df_rec[c(1, 10, 7, 8, 9, 6)]
rec_use <- rec_use %>% group_by(Zone, month) %>%
  summarise(capture_rate = median(`Capture Rate-Total ((Total Recycling - Leaves (Recycling)) / (Max Paper + Max MGP))x100`),
            paper_rate = median(`Capture Rate-Paper (Total Paper / Max Paper)`),
            mgp_rate = median(`Capture Rate-MGP (Total MGP / Max MGP)`),
            diversion_rate = median(`Diversion Rate-Total (Total Recycling / Total Waste)`),
            .groups = "drop_last")
# order months chronologically; month.name is R's built-in vector of all
# twelve month names, so no month can be left out of the levels
rec_use$month <- factor(rec_use$month, levels = month.name)
Further, the key dataset used for NYC building energy usage was missing key data. Specifically, not all observations had the borough field populated, and it had nothing on the previously mentioned 2D square footage. As such, the first step in our data transformation was merging the dataset with the zip-boroughs mapping (df_zips, trimmed to be only zip and borough) and land-use datasets (df_area_full) in order to access this desired information:
df_zips <- df_zips[c(1, 2)]
df_area_full$building_area = df_area_full$bldgfront * df_area_full$bldgdepth
df_area <- df_area_full[c(69, 100)]
df_build_energy$zip <- df_build_energy$`Postal Code`
df_build_energy <- merge(df_build_energy, df_zips, by = "zip")
df_build_energy <- merge(x = df_build_energy, y = df_area, by.x = "BBL - 10 digits", by.y = "bbl")
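One thing worth checking with merges like these is how many rows they drop, since `merge()` performs an inner join by default. The sketch below uses made-up stand-ins for the energy and zip-borough tables (the zip codes and values are hypothetical) to show how a left join via `all.x = TRUE` can surface unmatched rows.

```r
# Toy stand-ins for the building energy and zip-to-borough tables; values are made up
energy <- data.frame(zip = c("10001", "11370", "99999"),
                     ghg = c(120.5, 300.2, 80.0),
                     stringsAsFactors = FALSE)
zips <- data.frame(zip = c("10001", "11370"),
                   borough = c("Manhattan", "Queens"),
                   stringsAsFactors = FALSE)

# Default merge() is an inner join: rows without a match are silently dropped
inner <- merge(energy, zips, by = "zip")

# all.x = TRUE keeps every energy row and fills borough with NA where unmatched
left <- merge(energy, zips, by = "zip", all.x = TRUE)

rows_lost <- nrow(energy) - nrow(inner)
unmatched_zips <- left$zip[is.na(left$borough)]
print(rows_lost)
print(unmatched_zips)
```

Running the same comparison on the real tables would show whether the merged dataset covers meaningfully fewer buildings than the original.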
In addition, the key greenhouse gas (GHG) field had to be converted to a numeric for manipulation and interpretation. After this, we reshaped the data for further analysis by grouping by borough and zip code and taking both the mean and median GHG emissions for each. As previously noted, a single negative value for GHG emissions was found. It is filtered out here as well.
df_build_energy$ghg <- as.numeric(df_build_energy$`Total GHG Emissions (Metric Tons CO2e)`)
df_build_energy = filter(df_build_energy, ghg >= 0)
df_ghg <- df_build_energy %>% group_by(borough, zip) %>%
  summarise(avg_ghg = mean(ghg),
            median_ghg = median(ghg))
df_ghg$place <- paste(df_ghg$borough, df_ghg$zip, sep = " - ")
First, there are many observations in the recycling dataset with the month field not populated. As the problem appears to occur uniformly across boroughs and zones, we have opted to simply show the NA data in our later graph.
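The uniformity of the missing months can be checked directly. The following is a minimal sketch on made-up data (the zone names and values are assumptions) that computes the share of missing months per zone and lists which calendar months never appear at all; the latter can help distinguish gaps in the source data from a month that was lost during processing.

```r
# Toy recycling records; zones and months are made up
rec <- data.frame(
  zone = c("Bronx", "Bronx", "Bronx", "Bronx",
           "Queens East", "Queens East", "Queens East", "Queens East"),
  month = c("January", NA, "March", NA,
            "January", NA, "March", NA),
  stringsAsFactors = FALSE
)

# Share of rows with a missing month, by zone
na_share <- tapply(is.na(rec$month), rec$zone, mean)
print(na_share)

# Calendar months that never appear in the populated rows
months_absent <- setdiff(month.name, unique(na.omit(rec$month)))
print(months_absent)
```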
As mentioned previously, the borough field was not populated for most observations in the key building energy use dataset, as shown below:
The solution for this missing data was discussed previously.
We begin our analysis with a basic measure of how much a community cares about the environment - how well it recycles. Specifically, we consider the diversion rate (recycled waste/total waste) of every Zone (a subgroup of a borough) documented. Unfortunately, New York City falls well short on this measure.
Notice the flat horizontal line above the others. This is not marking the diversion rate for an NYC Zone - it is a comparison point. Specifically, it is the 2019 diversion rate for Virginia. Every zone in the city falls well short of this standard.
It should be noted that a city and a state may face different obstacles to recycling; however, NYC and VA are virtually the same size in population: New York’s population is only 200,000 people fewer than Virginia’s, according to the Census Bureau’s website. Further, New York is one of the most well-resourced cities in the world. Simply put, we should not be falling this short. Several municipalities in Virginia have implemented single-stream recycling, and the state as a whole is now outperforming New York City. Perhaps such a policy could help.
While recycling stood out as an easy entrance point to understanding New York’s current sustainability, this is, of course, a very incomplete picture. We next decided to consider perhaps the most important measure of an area’s environmental impact: greenhouse gas (GHG) emissions.
We begin by considering the largest sources of GHG in New York State over time:
The elevated levels of emissions caused by transportation immediately jump out. This suggests that cars, trucks, and other motor vehicles are some of the biggest local culprits.
Interestingly, this pattern does not appear to fully hold when considering just New York City, where stationary fuel burners represent a larger problem (we have not yet summed the state’s stationary sources, e.g. residential and industrial, to confirm how large this difference is):
Here, it is important to note two crucial limitations of our data. Firstly, the dataset for NYC emissions represents only one year - 2016. It is possible that we were unlucky, and this is an outlier. Secondly, NYC Open Data does not provide a methodology for measurement. This is quite important for this particular question. New York is a city with many commuters, and it is unclear whether emissions from cars driven by people who work in the city but do not live in it are included.
Still, caveats aside, the apparent dip in the importance of transportation does coincide with zooming in on a city that prioritizes walking and public transportation, suggesting that commuter driving may be a very important subset of transportation emissions.
On the other hand, the plot also demonstrates the clear importance of buildings when considering the city’s GHG emissions, warranting a closer look at building emissions in the city. As a first step, we consider the locations of our worst emitters:
While the 11370 zip code in Queens stands out as a particularly high emitter, Manhattan appears to be the borough with the most highly damaging buildings.
Of equal value is a better understanding of the types of buildings that are most often responsible for high emissions - shown below. Prior to considering the graph, we note that results here are filtered. Specifically, we consider only buildings with GHG emissions of 2,952 metric tons or fewer. We settled on this threshold after several rounds of graphing made it clear that anything above it skewed the graph in such a way as to make it difficult to read. The buildings kept represent over 96% of all observations.
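The share of observations kept by a cutoff like this can be computed in one line. The sketch below uses made-up emissions values (not the actual dataset), so the resulting share differs from the 96% reported above.

```r
# Made-up GHG values in metric tons; the real data has roughly 29,000 rows
ghg <- c(15, 240, 980, 2952, 3100, 45000)
threshold <- 2952

# Proportion of buildings at or below the plotting threshold
share_kept <- mean(ghg <= threshold)
print(share_kept)

# The filtered vector that would be passed on to the plot
ghg_plotted <- ghg[ghg <= threshold]
```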
Three building types’ distributions of emissions jump out as particularly high - Wholesale Club/Supercenter, Prison/Incarceration, and Other - Utility. It is unclear what the last of these refers to. Regarding prisons, it seems fairly straightforward that a compound filled with many, many people and complex security systems would have high energy usage. The Club/Supercenter building is perhaps most interesting when considered as a pair along with Ice/Curling Rink - another fairly high distribution. It may be the case that large cooling efforts in the city are particularly bad for the environment.
Before drawing any concrete conclusions, we should, however, point out some issues with our data. This is a data set for 1 year (2019), so there is a risk that it was an aberrant year. Perhaps more importantly, as a reminder, these are only buildings covered by Local Law 84, which is based on square footage thresholds. This means that we are likely only considering the worst emissions buildings throughout the city. It’s possible that our understanding of the locales and building types most responsible for emissions would shift if we had a more complete picture.
If we were to estimate New York’s building emissions overall, though, we may be somewhat surprised by the results. We consider the following:
First - as Local Law 84 separates our dataset based on square footage, we check whether this has a significant role in GHG emissions:
While some zero values may skew the overall picture, there does appear to be a generally positive correlation between the two. As such, in order to estimate total emissions, we divide the sum of all documented GHG emissions by the sum of all documented square footage - this gives us a metric of GHG emissions per square foot. The calculation gives about 0.0056 metric tons of GHG emissions per square foot.
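The per-square-foot metric is a ratio of sums rather than an average of per-building ratios. The sketch below illustrates the difference on made-up buildings (all values are hypothetical); the ratio of sums weights each building by its size, which is what a citywide scaling by total square footage calls for.

```r
# Made-up buildings: emissions in metric tons, area in square feet
buildings <- data.frame(ghg  = c(50, 400, 0, 1200),
                        sqft = c(10000, 60000, 25000, 200000))

# Ratio of sums: total emissions divided by total documented area
ghg_per_sqft <- sum(buildings$ghg) / sum(buildings$sqft)

# Mean of per-building ratios, shown only for contrast
mean_of_ratios <- mean(buildings$ghg / buildings$sqft)

print(ghg_per_sqft)
print(mean_of_ratios)
```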
Next, we apply this to the total square footage documented in the land use dataset. This dataset (with roughly 30 times as many buildings as the energy use dataset) has 5,584,887,007 total square feet documented in New York City. This comes out to an estimated grand total of 31,099,387 metric tons of GHG emissions from New York City in 2019. To put this number into context, Google Sustainability Initiative estimates that in the same year San Francisco emitted 4,360,000 metric tons of GHG. Obviously, New York’s number is much larger. However, New York’s population is also about 9.46 times larger than San Francisco’s. So, while our estimated(!) GHG emissions are 7.13 times larger than San Francisco’s, on a per person basis each of us is actually responsible for about 75.4% of the building emissions of one San Franciscan.
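The arithmetic behind this comparison can be reproduced from the figures quoted above. The variable names below are our own; the numbers are the ones stated in the text.

```r
nyc_ghg_est <- 31099387    # estimated NYC building emissions, metric tons
total_sqft  <- 5584887007  # total square footage in the land use dataset
sf_ghg      <- 4360000     # San Francisco emissions, metric tons
pop_ratio   <- 9.46        # NYC population relative to San Francisco

# Emissions rate implied by the citywide estimate (about 0.0056 per square foot)
implied_rate <- nyc_ghg_est / total_sqft

# NYC emissions relative to San Francisco, in total and per person
emissions_ratio  <- nyc_ghg_est / sf_ghg
per_capita_ratio <- emissions_ratio / pop_ratio

print(round(implied_rate, 4))
print(round(emissions_ratio, 2))
print(round(100 * per_capita_ratio, 1))
```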
Having taken stock of the current state of New York’s environmental impact, we shift our focus towards potential solutions.
Given the importance of transportation emissions statewide, and the decrease in its importance in a less car-dependent area, we begin with a consideration of on-road vehicles. Specifically, we review electric vehicles.
To begin, we compare the United States’ overall EV adoption rate to the rest of the world:
While the volume of EVs purchased over time is mostly a function of population size, rate of growth does not necessarily need to be. And the United States is clearly lacking in this area (especially as compared to China).
In order to understand if New York is a part of this trend or a bright spot, we consider the state’s current rebate program for electric vehicles. Similar to the country at large, we see slow adoption, even with the rebate. However, whether this is due to lack of interest is difficult to say. Beginning in late 2018, the number of vehicles purchased via the rebate has consistently outperformed expectations, and the rules of the rebate are such that not all people above this planned threshold will receive it. This could suggest that there is perhaps not an overwhelming natural desire for EVs, but a healthy one for EVs with a rebate. An increased cap could potentially spur more major growth in this area.
Also of note is the increase in emissions reductions via the program:
As with rebate participation, actual emissions reductions have exceeded planned reductions since late 2018. Of extra note, however, is that actual reductions appear to exceed planned reductions by a larger margin than rebate participation does. This suggests that, while access to the rebate may be a motivating factor, a desire for the lowest-emission vehicles available may be as well.
Cumulative rebate amount ($)
Quarterly rebate amount ($)
Zooming out, we consider rebate amounts for each year of the program. We see a somewhat linear growth pattern. While a simple line would certainly not fit the data perfectly, it does fit well enough for rough estimates of potential future usage.
Fitting a regression line to predict rebate usage moving forward, we see that the program will run out between 2021 Q4 and 2022 Q4 (at the intersection with the red line):
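A rough version of this projection can be sketched as a linear fit on cumulative rebate dollars, solved for the point where the fitted line reaches a cap. Everything in the block below is an assumption for illustration: the quarterly totals are made up, and `budget_cap` is a hypothetical stand-in for whatever limit the red line represents.

```r
# Made-up cumulative rebate totals (USD) for eight consecutive quarters
quarter_index  <- 1:8
cumulative_usd <- c(2.1, 4.0, 6.2, 7.9, 10.1, 12.0, 13.8, 16.1) * 1e6

# Simple linear trend of cumulative spending over time
fit <- lm(cumulative_usd ~ quarter_index)
b <- coef(fit)

# Hypothetical program cap; solve intercept + slope * q = cap for q
budget_cap <- 30e6
crossing_index <- (budget_cap - b[["(Intercept)"]]) / b[["quarter_index"]]

# First whole quarter in which the fitted line is at or above the cap
exhaustion_quarter <- ceiling(crossing_index)
print(crossing_index)
print(exhaustion_quarter)
```

Because the fit is extrapolated well beyond the observed quarters, the crossing point is best read as a range rather than a single date, which is consistent with the one-year window given above.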
This suggests that current policy could be holding the state back from a more environmentally friendly endpoint. Perhaps increasing the rebate could make this possible.